{2023.06}[2023b] Ginkgo 1.9.0 #1106

Closed
wants to merge 5 commits into from

Conversation

pratikvn

This PR adds Ginkgo to the easystack. Please feel free to give feedback.

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@ocaisa
Member

ocaisa commented May 30, 2025

The update is good, but please target it to https://github.com/EESSI/software-layer/blob/2023.06-software.eessi.io/easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-5.1.0-2023b.yml

@ocaisa
Member

ocaisa commented May 30, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:aarch64/neoverse_v1

@eessi-bot-surf

eessi-bot-surf bot commented May 30, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:aarch64/neoverse_v1 from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:aarch64/neoverse_v1
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:aarch64/neoverse_v1 resulted in:

    • no jobs were submitted
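
The "expanded format" reported above suggests that the bot maps short argument names to long ones before handling a command. A minimal Python sketch of that expansion, assuming only the two abbreviations visible in this thread (`repo` and `arch`); the function name and alias table are hypothetical and not taken from the bot's code:

```python
# Abbreviations observed in this thread; the real bot may accept others.
ALIASES = {"repo": "repository", "arch": "architecture"}


def expand_bot_command(command: str) -> str:
    """Expand abbreviated key:value arguments of a bot command."""
    action, *args = command.split()
    expanded = [action]
    for arg in args:
        # Split on the first colon only, so values keep any later colons.
        key, sep, value = arg.partition(":")
        expanded.append(f"{ALIASES.get(key, key)}{sep}{value}")
    return " ".join(expanded)


print(expand_bot_command(
    "build repo:eessi.io-2023.06-software "
    "instance:eessi-bot-mc-aws arch:aarch64/neoverse_v1"
))
```

Arguments without a known abbreviation, such as `instance:`, pass through unchanged.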

@eessi-bot-aws

eessi-bot-aws bot commented May 30, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1106/66256

date                      job status  comment
May 30 09:12:40 UTC 2025  submitted   job id 66256 awaits release by job manager
May 30 09:13:32 UTC 2025  released    job awaits launch by Slurm scheduler
May 30 09:19:34 UTC 2025  running     job 66256 is running
May 30 09:29:45 UTC 2025  finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-66256.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-17485969150.tar.gz
size: 0 MiB (45 bytes)
entries: 0
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/neoverse_v1/software
no software packages in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1
no other files in tarball
May 30 09:29:45 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:aarch64_neoverse_v1+default
P: perf: 984.188 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_neoverse_v1+default
P: perf: 983.441 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 3.16 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 3.08 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 4.3 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 4.43 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 0.41 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 0.44 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:aarch64_neoverse_v1+default
P: bandwidth: 35960.02 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_neoverse_v1+default
P: bandwidth: 36361.26 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-66256.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@ocaisa
Member

ocaisa commented May 30, 2025

@pratikvn The build is failing in the configure step because it runs inside a Slurm job that does not have enough slots. I'll allow oversubscription to push the MPI tests through. In case you want to cross-check, the general configuration looked like:

---------------------------------------------------------------------------------------------------------
--
--    Summary of Configuration for Ginkgo (version 1.9.0 with tag main)
--
--    Ginkgo configuration:
--        CMAKE_BUILD_TYPE:                                Release
--        BUILD_SHARED_LIBS:                               ON
--        CMAKE_INSTALL_PREFIX:                            /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/Ginkgo/1.9.0-gompi-2023b
--        PROJECT_SOURCE_DIR:                              /tmp/bot/easybuild/build/Ginkgo/1.9.0/gompi-2023b/ginkgo-1.9.0
--        PROJECT_BINARY_DIR:                              /tmp/bot/easybuild/build/Ginkgo/1.9.0/gompi-2023b/easybuild_obj
--        CMAKE_CXX_COMPILER:                              GNU 13.2.0 on platform Linux aarch64
--                                                         /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/OpenMPI/4.1.6-GCC-13.2.0/bin/mpicxx
--    User configuration:
--      Enabled modules:
--        GINKGO_BUILD_OMP:                                ON
--        GINKGO_BUILD_MPI:                                ON
--        GINKGO_BUILD_REFERENCE:                          ON
--        GINKGO_BUILD_CUDA:                               OFF
--        GINKGO_BUILD_HIP:                                OFF
--        GINKGO_BUILD_SYCL:                               OFF
--      Enabled features:
--        GINKGO_MIXED_PRECISION:                          OFF
--        GINKGO_HAVE_GPU_AWARE_MPI:                       OFF
--        GINKGO_ENABLE_HALF:                              ON
--      Tests, benchmarks and examples:
--        GINKGO_BUILD_TESTS:                              ON
--        GINKGO_FAST_TESTS:                               ON
--        GINKGO_BUILD_EXAMPLES:                           ON
--        GINKGO_EXTLIB_EXAMPLE:
--        GINKGO_BUILD_BENCHMARKS:                         ON
--        GINKGO_BENCHMARK_ENABLE_TUNING:                  OFF
--      Documentation:
--        GINKGO_BUILD_DOC:                                OFF
--        GINKGO_VERBOSE_LEVEL:                            1
--
---------------------------------------------------------------------------------------------------------
--
--      Developer Tools:
--        GINKGO_DEVEL_TOOLS:                              OFF
--        GINKGO_WITH_CLANG_TIDY:                          OFF
--        GINKGO_WITH_IWYU:                                OFF
--        GINKGO_CHECK_CIRCULAR_DEPS:                      OFF
--        GINKGO_WITH_CCACHE:                              ON
---------------------------------------------------------------------------------------------------------
--
--      Components:
--        GINKGO_BUILD_PAPI_SDE:                           OFF
--        GINKGO_BUILD_HWLOC:                              OFF
--
--  Detailed information (More compiler flags, module configuration) can be found in detailed.log
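
How oversubscription was enabled is not shown in this thread. With the Open MPI 4.1.6 toolchain listed in the summary, one possible route is the `rmaps_base_oversubscribe` MCA parameter, set through the environment. A hedged sketch as an easyconfig fragment (easyconfigs use Python syntax); the variable name `local_oversubscribe` and the choice of `preconfigopts`/`pretestopts` are assumptions, not the recipe actually used:

```python
# Hypothetical easyconfig fragment, not the recipe used in this PR.
# OMPI_MCA_rmaps_base_oversubscribe lets Open MPI 4.x start more
# ranks than the scheduler has allocated slots.
local_oversubscribe = 'export OMPI_MCA_rmaps_base_oversubscribe=1 && '

# Apply it to both the configure step and the test step.
preconfigopts = local_oversubscribe
pretestopts = local_oversubscribe
```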

@boegel boegel added the EuroHPC label Jun 11, 2025
@EESSI EESSI deleted a comment from eessi-bot-aws bot Jun 11, 2025
@ocaisa
Member

ocaisa commented Jun 11, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:aarch64/neoverse_v1

@eessi-bot-aws

eessi-bot-aws bot commented Jun 11, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.06/pr_1106/68890

date                      job status  comment
Jun 11 16:07:24 UTC 2025  submitted   job id 68890 awaits release by job manager
Jun 11 16:08:15 UTC 2025  released    job awaits launch by Slurm scheduler
Jun 11 16:09:24 UTC 2025  running     job 68890 is running
Jun 11 16:18:03 UTC 2025  finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-68890.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-17496582320.tar.gz
size: 0 MiB (45 bytes)
entries: 0
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/neoverse_v1/software
no software packages in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1
no other files in tarball
Jun 11 16:18:03 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:aarch64_neoverse_v1+default
P: perf: 975.637 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_neoverse_v1+default
P: perf: 935.688 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 3.02 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 3.15 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 4.29 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 4.3 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 0.4 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_neoverse_v1+default
P: latency: 0.39 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:aarch64_neoverse_v1+default
P: bandwidth: 32599.85 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_neoverse_v1+default
P: bandwidth: 34163.12 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-68890.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@ocaisa
Member

ocaisa commented Jun 12, 2025

@pratikvn Ginkgo is a little too clever and is querying the number of available slots and filtering tests as a result. This is leading to CMake errors:

{EESSI 2023.06} [ocaisa@aarch64-neoverse-v1-node1 ~]$ grep -A 3 -i error /tmp/eb-0mam__wq/eb-6_jzs3dk/run-shell-cmd-output/cmake-_qwfwuu5/out.txt
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/assembly_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/matrix_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/partition_helpers_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/vector_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/preconditioner/schwarz_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/solver/solver_omp
--
CMake Error at cmake/create_test.cmake:74 (set_property):
  set_property given TEST names that do not exist:

    test/mpi/multigrid/pgm_omp
--
-- Configuring incomplete, errors occurred!

Can I force it to not do the filtering?

@ocaisa
Member

ocaisa commented Jun 12, 2025

I think I have found a workaround: adding -DMPIEXEC_MAX_NUMPROCS=8 to the CMake configure options.
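
In an EasyBuild recipe, a flag like this would typically be passed through `configopts`. A minimal sketch as an easyconfig fragment (easyconfigs use Python syntax); only the `MPIEXEC_MAX_NUMPROCS` flag comes from this thread, and the actual recipe may set it differently:

```python
# Hypothetical easyconfig fragment; the real recipe may differ.
# MPIEXEC_MAX_NUMPROCS is the CMake FindMPI variable that holds the
# number of MPI processes assumed to be available on the host.
configopts = '-DMPIEXEC_MAX_NUMPROCS=8 '
```

Setting the value explicitly stops CMake from deriving it from the host, which is presumably what Ginkgo's test filtering reads.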

@ocaisa
Member

ocaisa commented Jun 12, 2025

@pratikvn I managed to get Ginkgo to compile for Neoverse_V1 but I am seeing a single failing test:

[ RUN      ] Cgs/std::complex<float>.SolvesDenseSystemMixedComplex
/tmp/ocaisa/easybuild/build/Ginkgo/1.9.0/gompi-2023b/ginkgo-1.9.0/reference/test/solver/cgs_kernels.cpp:345: Failure
Relative error between x and {value_type{-4.0, 8.0}, value_type{-1.0, 2.0}, value_type{4.0, -8.0}} is 0.00026025000584008542
        which is larger than (r_mixed<value_type, TypeParam>() * 1e2) (which is 0.0001685857682787173)
x is:
        (-4.0013704299926758,7.9991722106933594)
        (-0.99788099527359009,2.0012552738189697)
        (3.9986329078674316,-8.0008230209350586)
{value_type{-4.0, 8.0}, value_type{-1.0, 2.0}, value_type{4.0, -8.0}} is:
        (-4,8)
        (-1,2)
        (4,-8)
component-wise relative error is:
        0.00017900116972431731
        0.0011013569957268345
        0.00017840379936823707


[  FAILED  ] Cgs/std::complex<float>.SolvesDenseSystemMixedComplex, where TypeParam = std::complex<float> (0 ms)

How concerned should I be about that? If I shouldn't be, how do I skip that test?
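
To put the size of the miss in perspective, the figures in the log can be recomputed from the printed vectors. The sketch below is in Python and assumes the relative error is the 2-norm of the difference divided by the larger of the two vector norms; that formula is inferred from the reported numbers, not taken from Ginkgo's source:

```python
import math

# Values copied from the test output above.
x = [
    complex(-4.0013704299926758, 7.9991722106933594),
    complex(-0.99788099527359009, 2.0012552738189697),
    complex(3.9986329078674316, -8.0008230209350586),
]
expected = [complex(-4.0, 8.0), complex(-1.0, 2.0), complex(4.0, -8.0)]
tolerance = 0.0001685857682787173


def relative_error(a, b):
    """2-norm of (a - b) over the larger 2-norm of a and b (assumed formula)."""
    diff = sum(abs(u - v) ** 2 for u, v in zip(a, b))
    norm_a = sum(abs(u) ** 2 for u in a)
    norm_b = sum(abs(v) ** 2 for v in b)
    return math.sqrt(diff / max(norm_a, norm_b))


overall = relative_error(x, expected)
componentwise = [relative_error([u], [v]) for u, v in zip(x, expected)]

print(f"overall: {overall:.6e} (tolerance {tolerance:.6e})")
print(f"ratio to tolerance: {overall / tolerance:.2f}")
print("component-wise:", [f"{c:.6e}" for c in componentwise])
```

Under that assumption the overall error is roughly 1.5 times the tolerance, and the second component accounts for most of it.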

@pratikvn
Author

Hi @ocaisa, thank you for looking into this.

  1. I think the MPIEXEC_MAX_NUMPROCS workaround should be sufficient for now. As Tobias mentions in ginkgo-project/ginkgo#1865 ("Test filtering based on available MPI ranks causing CMake errors"), the proper fix is to add a return() statement, if you would prefer a more concrete fix.
  2. I think the test failure should be safe to ignore. CGS is a somewhat unstable algorithm, and in mixed precision it can run into numerical issues, particularly across different parallel architectures.
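
The failure mode behind point 1 can be illustrated with a small Python analogy. This is not Ginkgo's CMake code; the function names, the `PROCESSORS` key, and the rank counts are hypothetical. A test that is filtered out never gets registered, so setting a property on it afterwards fails unless the function returns early:

```python
def set_test_property(tests, name, key, value):
    # Like CMake's set_property, reject names that were never registered.
    if name not in tests:
        raise KeyError(f"set_property given TEST names that do not exist: {name}")
    tests[name][key] = value


def create_mpi_test(tests, name, ranks, max_ranks, early_return):
    if ranks > max_ranks:
        # Too few MPI slots: the test is filtered out and not registered.
        if early_return:
            return  # the fix: stop before touching an unregistered test
    else:
        tests[name] = {}
    set_test_property(tests, name, "PROCESSORS", ranks)


tests = {}
create_mpi_test(tests, "test/mpi/vector_omp", ranks=4, max_ranks=2, early_return=True)
print(tests)  # the filtered test is skipped without an error
```

Calling the same function with `early_return=False` raises the error, which mirrors the messages in the configure log; raising `max_ranks` to 8 registers the test, which is the effect of the MPIEXEC_MAX_NUMPROCS workaround.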

@ocaisa
Member

ocaisa commented Jun 12, 2025

@pratikvn We did a reorganisation of this repo last night, so I am going to replace this PR so that we can proceed. I got a working recipe (at least for Neoverse V2) at easybuilders/easybuild-easyconfigs#23078

@ocaisa
Member

ocaisa commented Jun 12, 2025

Replacing this by #1127

@ocaisa ocaisa closed this Jun 12, 2025
@boegel
Contributor

boegel commented Jun 12, 2025

Just changing the target branch to main should've been sufficient BTW, there's no need to open another PR (but fine, that's been done, so proceed that way)
